A Distributed and Cooperative NameNode Cluster for a Highly-Available Hadoop Distributed File System
Similar resources
Optimistic Concurrency Control in a Distributed NameNode Architecture for Hadoop Distributed File System
The Hadoop Distributed File System (HDFS) is the storage layer for the Apache Hadoop ecosystem, persisting large data sets across multiple machines. However, the overall storage capacity is limited since the metadata is stored in memory on a single server, called the NameNode. The heap size of the NameNode restricts the number of data files and addressable blocks persisted in the file system. The H...
DNN: A Distributed NameNode Filesystem for Hadoop
The Hadoop Distributed File System (HDFS) is the distributed storage infrastructure for the Hadoop big-data analytics ecosystem. A single node, called the NameNode of HDFS, stores the metadata of the entire file system and coordinates the file content placement and retrieval actions of the data storage subsystems, called DataNodes. However, the single NameNode architecture has long been viewed a...
HyFS: A Highly Available Distributed File System
HyFS is designed to employ erasure codes to build a highly available distributed file system. It implements a general framework to use any erasure code. Thus, by applying different erasure codes, HyFS offers high flexibility for customizations to meet various application requirements.
NameNode and DataNode Coupling for a Power-Proportional Hadoop Distributed File System
Current works on power-proportional distributed file systems have not considered the cost of updating data sets that were modified (updated or appended) in a low-power mode, where a subset of nodes were powered off. Effectively reflecting the updated data is vital in making a distributed file system, such as the Hadoop Distributed File System (HDFS), power proportional. This paper presents a no...
Google File System and Hadoop Distributed File System - An Analogy
Big Data has been the term the IT industry is talking about lately. With the advancement of automation and data being processed in real time, it has become a necessity for companies to look for sustainable solutions to store their huge datasets and compute valuable information from them. High-performance computing relies heavily on distributed environments to process large chunk...
Journal
Journal title: IEICE Transactions on Information and Systems
Year: 2015
ISSN: 0916-8532,1745-1361
DOI: 10.1587/transinf.2014edp7258